Integration of biological data on transcriptome

Authors

  • Laure Berti-Équille
  • Fouzia Moussouni
  • Anne Arcade
Abstract

A major concern in modern biology and medical research is the use of a high-throughput technology, known as bio-arrays or DNA chips, that allows thousands of genes to be studied simultaneously. INSERM U522, a medical research unit specializing in the liver, uses transcriptome techniques to diagnose liver disease states and to point the way towards new therapies. To this end, the design of an integrated bioinformatics environment named Gedaw (Gene Expression DAta Warehouse) has been initiated for storing, managing, and analyzing such data. As an object-oriented data warehouse, it holds knowledge and complex data on genes expressed in the liver. The concept of ontology is the keystone of the application: it is used to integrate genomic data available in public databanks with experimental data on genes obtained from laboratory experiments and clinical records. This paper describes the data modeling and processing that make it possible (i) to capture data on genes from public databanks (e.g., GenBank), (ii) to extract relevant information by selecting objects imported in XML format, and (iii) to make them persistent in the object-oriented warehouse.

RÉSUMÉ: New "high-throughput" techniques of biological analysis generate a considerable mass of data that must be organized, stored, and managed. INSERM research unit U522, which uses these techniques to study the liver transcriptome, has initiated the development of an integrated environment named Gedaw (Gene Expression DAta Warehouse), dedicated to the management, integration, and analysis of these data. An object-oriented data warehouse, it gathers knowledge and complex data on liver genes. The concept of ontology, at the center of the application, makes it possible to integrate both data on genomic sequences from public databanks and data from laboratory experiments and clinical records. This article presents the problem of integrating biological data related to the transcriptome. It describes the data model associated with this application case, implemented as classes of persistent objects in an object-oriented database. The processing chain developed in Gedaw makes it possible (i) to integrate genomic data from public databanks (such as GenBank), (ii) to extract relevant information by selecting objects in XML format, and (iii) to make them persistent in the warehouse.

Ingénierie des Systèmes d'Information. Volume X n°X/2002

KEY-WORDS: data integration, biological data, ontology, object-oriented data warehouse, XML, transcriptome.
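
The three processing steps named in the abstract (capture of gene records, extraction of relevant fields from XML, persistence as objects) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the XML element names, the `Gene` class, the `DEMO...` accession numbers, and the use of the standard-library `shelve` module as a stand-in object store. It does not reproduce the actual GenBank XML schema or the Gedaw implementation, which the abstract does not detail.

```python
import os
import shelve
import tempfile
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

# Hypothetical XML export; element names are illustrative only and do not
# follow the real GenBank XML schema.
SAMPLE_XML = """
<records>
  <gene accession="DEMO0001">
    <name>ALB</name>
    <organism>Homo sapiens</organism>
    <comment>field the warehouse does not need</comment>
  </gene>
  <gene accession="DEMO0002">
    <name>TF</name>
    <organism>Homo sapiens</organism>
  </gene>
</records>
"""


@dataclass
class Gene:
    """Hypothetical warehouse class holding only the selected fields."""
    accession: str
    name: str
    organism: str


def extract_genes(xml_text):
    """Step (ii): select the relevant fields from each XML gene record."""
    root = ET.fromstring(xml_text.strip())
    return [
        Gene(
            accession=node.get("accession"),
            name=node.findtext("name"),
            organism=node.findtext("organism"),
        )
        for node in root.iter("gene")
    ]


def persist(genes, path):
    """Step (iii): store each gene under its accession number."""
    with shelve.open(path) as store:
        for gene in genes:
            # Stored as a plain dict so the record does not depend on
            # where the Gene class is defined when it is read back.
            store[gene.accession] = asdict(gene)


def load(path, accession):
    """Read one persisted gene back as a Gene object."""
    with shelve.open(path) as store:
        return Gene(**store[accession])


if __name__ == "__main__":
    # Step (i) is simulated here by the in-memory SAMPLE_XML string.
    db_path = os.path.join(tempfile.mkdtemp(), "gedaw_demo")
    persist(extract_genes(SAMPLE_XML), db_path)
    print(load(db_path, "DEMO0001"))
```

In this sketch the `<comment>` element is simply ignored, which mirrors the idea of keeping only the information relevant to the warehouse; a real pipeline would fetch records from the public databank instead of using a literal string.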


Similar resources

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...


Pathway-level Integration of Proteogenomic Data in Breast Cancer Using Independent Component Analysis

Recent advances in the multi-omics characterization necessitate pathway-level abstraction and knowledge integration across different data types. In this study, we apply independent component analysis (ICA) to human breast cancer proteogenomics data to retrieve mechanistic information. We show that as an unsupervised feature extraction method, ICA was able to construct signatures with known biol...


BiDIP: a Biological Data Integration Platform for Transcriptome Analysis

Many studies aimed to construct an automated gene expression analysis platform for researchers. However, they lack an integrated data model for analyzing heterogeneous data. In order to address this issue, we created a biological data integration platform for transcriptome analysis (BiDIP) for managing various kinds of databases. As part of this platform, we developed a biological interaction d...


Transcriptome analysis of the freshwater pearl mussel, Hyriopsis cumingii (Lea) using Illumina paired-end sequencing to identify genes and markers

The transcriptome of the triangle sail mussel Hyriopsis cumingii (Lea) was sequenced using Illumina paired-end technology and analyzed. Equal quantities of total RNA isolated from six tissues, including gonad, hepatopancreas, foot, mantle, gill and adductor muscle, were pooled to construct a cDNA library. A total of 58.09 million clean reads with 98.48 % Q20 bases were generated. Cluster...


Transcriptome analysis of the freshwater pearl mussel, Hyriopsis cumingii (Lea) using Illumina paired-end sequencing to identify genes and markers

The transcriptome of the triangle sail mussel Hyriopsis cumingii (Lea) was sequenced using Illumina paired-end technology and analyzed. Equal quantities of total RNA isolated from six tissues, including gonads, hepatopancreas, foot, mantle, gills and adductor muscles, were pooled to construct a cDNA library. A total of 58.09 million clean reads with 98.48 % Q20 bases were generated. Clus...



Journal title:
  • Ingénierie des Systèmes d'Information

Volume 6  Issue 

Pages  -

Publication date 2001